enable optimizer in tests and procmacros #5155
base: master
Conversation
@willhickey @yihau please take a look

Force-pushed from cebe934 to 3a892cc (Compare)
@@ -1830,7 +1823,6 @@ pub mod tests {
        {
            let executable_bool: bool = account.executable();
            assert!(!executable_bool);
Removed this assert as per advice from @roryharr
More context: this assert was added a long time ago by @ryoqun just to monitor some undefined behaviour.
The executable bit is a boolean which uses a byte of storage. The only valid values for that byte are zero and one. This is verified during account load by the function sanitize_executable.
This test corrupts the byte and verifies that sanitize_executable fails, which it does. Afterwards it has various asserts to see how the code behaves in the event the sanitize check was not present. Per @ryoqun's comments, these are purely for his interest.
When compiling with optimizations on, the compiler's behaviour changes when accessing the byte by value: `let byte: u8 = unsafe { std::mem::transmute::<bool, u8>(executable_bool) };` returns the raw byte, rather than the underlying data converted to a bool and then converted to a u8.
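To make the distinction concrete, here is a minimal sketch (variable names are illustrative; this is not the PR's actual test code): for a valid bool, the transmute and the plain `as` cast agree, and the undefined behaviour only appears once the stored byte has been corrupted to a value other than 0 or 1.

```rust
// Illustrative sketch, not the actual test code.
fn main() {
    // A bool occupies exactly one byte of storage.
    assert_eq!(std::mem::size_of::<bool>(), 1);

    let executable_bool: bool = true;

    // bool -> u8 via `as` is always well-defined.
    let via_as: u8 = executable_bool as u8;

    // The transmute from the test goes in the same direction, so it is
    // equally well-defined for the valid values 0 and 1...
    let via_transmute: u8 =
        unsafe { std::mem::transmute::<bool, u8>(executable_bool) };
    assert_eq!(via_as, 1);
    assert_eq!(via_transmute, 1);

    // ...but if the stored byte had been corrupted to, say, 2 before
    // being read as a bool, that read would already be undefined
    // behaviour, and the optimizer is free to produce any result.
    // That case is intentionally not reproduced here.
}
```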
@ryoqun In case you are still interested
@@ -1277,13 +1277,6 @@ pub mod tests {
    }
}

fn get_executable_byte(&self) -> u8 {
this function was only used in a test
Cargo.toml (Outdated)

# Enable basic optimizations for unittests
[profile.test]
opt-level = 1
One of the reasons unit tests do not have optimization enabled is to speed up compilation for interactive development.
When working on a single unit test, you want the fastest edit/compile/run cycle.
Here is an example of a command I would run when working on a specific unit test:
cargo test \
--package solana-core \
--test epoch_accounts_hash \
--all-features -- \
--exact --nocapture test_snapshots_have_expected_epoch_accounts_hash
Have you considered the increase in compilation time when switching from opt-level 0 to opt-level 1 for these use cases?
I think a common approach for this kind of CI speed improvement is to build unit tests with optimizations in CI only.
cargo supports custom profiles, so it might be possible to create a custom profile, used only by CI, that enables a higher optimization level.
More details on custom profiles can be found here:
https://doc.rust-lang.org/cargo/reference/profiles.html#custom-profiles
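A sketch of what such a setup could look like (the profile name is illustrative; this is a suggestion, not something the PR contains): a custom profile in the workspace Cargo.toml that inherits from `test` but raises the optimization level, which CI then selects explicitly.

```toml
# Hypothetical custom profile for CI. Local `cargo test` keeps the
# default unoptimized `test` profile for fast edit/compile/run cycles.
[profile.ci-test]
inherits = "test"
opt-level = 1
```

CI would then opt in with something like `cargo test --profile ci-test` (custom profiles and the `--profile` flag are stable since Rust 1.57).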
o1 is in many cases practically free in terms of compile times. For example, running `time cargo test --package solana-core --test epoch_accounts_hash`:

Clean build with optimizations (o1):
Executed in 177.98 secs    fish    external
   usr time  76.21 mins    0.00 micros  76.21 mins
   sys time  10.00 mins  839.00 micros  10.00 mins

Clean build with no optimizations (default settings):
Executed in 159.20 secs    fish    external
   usr time  40.46 mins  837.00 micros  40.46 mins
   sys time   8.29 mins    0.00 micros   8.29 mins
So, we would save 15 seconds of real build time (on full rebuild), while spending far longer waiting for the actual tests to run (assuming you are running more than one at a time). Some of our tests already take minutes to complete, especially the ones in local-cluster, and we only need to build stuff once to run all the tests. Full CI runs are almost always > 1 hour. o1 gives about 10x boost in speed compared to o0.
Another aspect here is that with optimizer disabled certain kinds of bugs such as race conditions may just never show up. Essentially we end up testing something other than what we run in prod in terms of actual generated assembly, and the differences can be quite dramatic.
> o1 is in many cases practically free in terms of compile times. For example, running `time cargo test --package solana-core --test epoch_accounts_hash`:
>
> Clean build with optimizations (o1): Executed in 177.98 secs (fish, external); usr time 76.21 mins; sys time 10.00 mins
>
> Clean build with no optimizations (default settings): Executed in 159.20 secs (fish, external); usr time 40.46 mins; sys time 8.29 mins
>
> So, we would save 15 seconds of real build time (on full rebuild), while spending far longer waiting for the actual tests to run (assuming you are running more than one at a time).
When working on a single test, I would be running just this one test.
And if the compilation is 15 seconds slower, most of those 15 seconds will be added to my edit/compile/run cycle. The speed-up in a single test execution is normally unnoticeable.
Are you using a multithreaded linker, such as mold?
> Some of our tests already take minutes to complete, especially the ones in local-cluster
local-cluster is the most extreme example. And it is not a unit test, but an integration test.
This is why it is a separate crate, which can have optimization configured for it separately.
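For illustration, a per-package override could raise the optimization level for just one heavyweight test crate while leaving everything else unoptimized (the package name here is an assumption about the repo layout, not taken from the PR):

```toml
# Hypothetical: optimize only one slow integration-test crate,
# leaving the default test profile at opt-level 0 elsewhere.
[profile.test.package.solana-local-cluster]
opt-level = 1
```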
> we only need to build stuff once to run all the tests. Full CI runs are almost always > 1 hour. o1 gives about 10x boost in speed compared to o0.
This is the CI use case.
And I think speeding up our CI is a great cause.
But for the developer use case, you normally compile one crate and run one test.
And if unit tests are small and fast, you do not benefit from the speed-up, but you do suffer from the increased compilation time.
> Another aspect here is that with optimizer disabled certain kinds of bugs such as race conditions may just never show up. Essentially we end up testing something other than what we run in prod in terms of actual generated assembly, and the differences can be quite dramatic.
While it is great to be able to find UB issues, I am not sure this is a good argument here.
I do not have comprehensive statistics, but my guess is that, depending on the UB, you may or may not observe it at different optimization levels.
So it could be the switch between them that is fruitful.
Or you could miss it in either case.
You cannot really rely on unit tests to catch undefined behavior.
15 sec is for a full rebuild after cargo clean. While incrementally building one test, the difference is not really measurable.
@@ -1277,13 +1277,6 @@ pub mod tests {
    }
}

fn get_executable_byte(&self) -> u8 {
This change seems unrelated to the unit tests speed up.
Why not move it into a separate PR?
The only reason this change is happening is to make it possible to enable optimizations for unit tests. For more substantial changes, separate PRs have been made by the teams handling the respective pieces of code. I'll be making some as well.
If this check is not really useful (as is claimed in the discussion), removing it can be done in a separate PR, as it has value by itself: making things simpler and avoiding UB, IIUC.
And merging smaller PRs is generally easier.
For example, should you end up introducing a separate compilation profile for CI, this PR will increase in size and take some time to land.
But this particular change might go in pretty quickly on its own.
It is up to you, of course. I was just speaking from personal experience :)
Problem
Summary of Changes